12 - Pattern Recognition [PR] - PR 9 [ID:22075]

Welcome back to Pattern Recognition. Today we want to look more closely at the log-likelihood function of logistic regression: how to actually optimize it and how to find the optimal parameter configuration.

So this is our maximization of the log-likelihood function. We have seen that it is concave, meaning that the negative log-likelihood is convex, and this is the reason why we are able to use the Newton-Raphson algorithm to solve this unconstrained optimization problem.

If you apply this algorithm, you can see that we get parameter updates simply by starting with some initial configuration for theta and then updating it iteratively. So let's say we are in iteration k; then we obtain the updated version theta k plus 1 by subtracting from the current theta the inverse of the Hessian matrix of the log-likelihood function times the gradient of the log-likelihood function with respect to our parameter vector theta. You can also write this up in matrix form, and then you essentially end up with a weighted least squares iteration scheme.
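To make this concrete, the update rule and its weighted least squares form can be written as follows. This is my reconstruction from the spoken description; the symbols X, y, p, W, and z in the second line are the usual design matrix, labels, predicted probabilities, weight matrix, and working response of standard iteratively reweighted least squares for logistic regression, not notation taken from the video:

```latex
\begin{align*}
  \theta^{(k+1)} &= \theta^{(k)}
    - \Bigl(\nabla_\theta^2 L\bigl(\theta^{(k)}\bigr)\Bigr)^{-1}
      \nabla_\theta L\bigl(\theta^{(k)}\bigr) \\
  % Weighted least squares form (standard IRLS; X, y, p, W, z are assumed symbols):
  \theta^{(k+1)} &= \bigl(X^\top W X\bigr)^{-1} X^\top W\, z,
  \qquad
  W = \operatorname{diag}\bigl(p_i\,(1 - p_i)\bigr),
  \quad
  z = X\theta^{(k)} + W^{-1}(y - p)
\end{align*}
```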

Now the question is how we come up with such a solution scheme and how we actually determine this Hessian and the gradient of the log-likelihood function. Let's start by discussing the Newton-Raphson iteration. It is essentially motivated by Taylor's theorem, which, as you all know, is a way of approximating a k times differentiable function f. If you have some initial point x0, then the Taylor approximation tells us that f of x0 plus h is equal to f of x0, plus the first derivative at x0 times h, plus the second derivative of the function at x0 times h to the power of 2 divided by the factorial of 2, and we can go on with this until we have the complete Taylor series, which runs all the way up to order k. If we take the limit of h going to 0, the remainder term goes to 0 as well. So the closer we stay to the initial position, the better our approximation will be, and we can now use this to derive an optimization strategy.
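Written out, the expansion just described is the standard Taylor formula around x0:

```latex
\[
  f(x_0 + h) = f(x_0) + f'(x_0)\,h + \frac{f''(x_0)}{2!}\,h^2
             + \dots + \frac{f^{(k)}(x_0)}{k!}\,h^k + o\bigl(h^k\bigr),
  \qquad h \to 0 .
\]
```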

In particular, we look at the second-order polynomial, where we truncate the expansion after the second-order term, so we have f of x0, plus f prime of x0 times h, plus 1 over 2 times the second derivative of f at x0 times h squared. Now we are actually interested in finding h, so we take this approximation and compute the derivative with respect to h. If we do so, you see that the first term vanishes, the h in the linear term drops out, and the h squared in the quadratic term is reduced to h, and exactly this is what we find on the next slide. So the derivative can be expressed as the first derivative of f at x0 plus the second-order derivative of the function at x0 times h. Now we want to find the minimum of our quadratic approximation, so we set this derivative to 0. If we do that, we can rearrange by moving f prime to the right-hand side, where it picks up a negative sign, and divide by the second-order derivative. This gives us an estimate h hat. Now we can use h hat to find x1, our updated position: it is simply x0 plus h hat, which can be written out as x0 minus the first derivative of the function at x0 divided by the second-order derivative of the function at x0. If you bring this to matrix notation, you see that the right-hand term can be rewritten as the inverse Hessian times the gradient of the function. So this is exactly our Newton-Raphson iteration, and let's look at an example that shows how this works.
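As a compact summary, the derivation just described reads as follows (reconstructed from the spoken description, using the same symbols):

```latex
\begin{align*}
  f(x_0 + h) &\approx f(x_0) + f'(x_0)\,h + \tfrac{1}{2} f''(x_0)\,h^2 \\
  \frac{\partial}{\partial h}\Bigl[f(x_0) + f'(x_0)\,h + \tfrac{1}{2} f''(x_0)\,h^2\Bigr]
    &= f'(x_0) + f''(x_0)\,h \overset{!}{=} 0
    \;\Rightarrow\; \hat{h} = -\frac{f'(x_0)}{f''(x_0)} \\
  x_1 = x_0 + \hat{h} &= x_0 - \frac{f'(x_0)}{f''(x_0)},
  \qquad
  \boldsymbol{x}_1 = \boldsymbol{x}_0 - H^{-1}(\boldsymbol{x}_0)\,\nabla f(\boldsymbol{x}_0)
\end{align*}
```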

We need to initialize at some point, so let's pick this point for the initialization. Then we can set up our quadratic approximation, and it returns this function. Now we determine our h hat, and h hat lands exactly here, so this is our new update. We then project it back onto the original function, and you see that in the next step we can again produce this quadratic fit, again determine our h hat, and this slowly brings us towards the minimum of the function. You see that although we are only using a second-order approximation, we still converge towards the minimum of a function that is actually constructed as x to the power of 4. Okay, so this is the scheme that we want to use in order to find our solutions.
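Here is a minimal sketch of this iteration in Python. It is not the original lecture code; the example function f(x) = x^4 and the starting point x0 = 2 are just assumptions matching the animation described above:

```python
def newton_raphson_1d(f_prime, f_double_prime, x0, n_iter=50, tol=1e-10):
    """Minimize a 1-D function via Newton-Raphson from its first and second derivatives."""
    x = x0
    for _ in range(n_iter):
        h_hat = -f_prime(x) / f_double_prime(x)  # h_hat = -f'(x) / f''(x)
        x = x + h_hat                            # update: x <- x + h_hat
        if abs(h_hat) < tol:                     # stop once the step becomes tiny
            break
    return x

# Example from the lecture animation: f(x) = x^4, with its minimum at x = 0.
f_prime = lambda x: 4.0 * x**3          # f'(x)
f_double_prime = lambda x: 12.0 * x**2  # f''(x)

print(newton_raphson_1d(f_prime, f_double_prime, x0=2.0))
# Each step multiplies x by 2/3, so the iterates slowly approach the minimum at 0.
```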

Now we still have to look into the gradient and the Hessian matrix of our problem.

Part of a video series:

Accessible via: Open Access

Duration: 00:15:28 min

Recording date: 2020-10-28

Uploaded on: 2020-10-28 15:26:55

Language: en-US

In this video, we learn how to estimate the parameters of the logistic function effectively.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: Damiano Baldoni - Thinking of You
